Communications Medicine
○ Springer Science and Business Media LLC
Preprints posted in the last 30 days, ranked by how well they match Communications Medicine's content profile, based on 85 papers previously published here. The average preprint has a 0.04% match score for this journal, so anything above that is already an above-average fit.
Yuan, S.; McVey, J. C.; Hartmann, K.; Abramowitz, S.; Woerner, J.; Shakt, G.; Judy, R.; Douglas, J. E.; Voight, B. F.; Kohanski, M. A.; Cohen, N. A.; Levin, M.; Damrauer, S. M.
Background Chronic rhinosinusitis (CRS) and nasal polyps (NP) are closely related inflammatory airway diseases, and their co-occurrence is often associated with more persistent symptoms, frequent recurrence, and substantial respiratory morbidity. However, the extent to which CRS without and with NP (CRSsNP and CRSwNP) share genetic susceptibility, and which genetic mechanisms are disease-specific, remains poorly characterized. Methods We conducted cross-population genome-wide association meta-analyses of overall CRS (including both CRSwNP and CRSsNP) and NP (a proxy for CRSwNP) using data from six biobanks. We estimated genome-wide genetic correlations between overall CRS, CRSwNP, and a spectrum of respiratory diseases. We applied five complementary gene-prioritization strategies to nominate CRS- and CRSwNP-associated genes and performed pathway enrichment analyses to infer implicated biological processes. For CRSwNP, we integrated single-cell transcriptomic data to characterize cell-type-specific expression of prioritized genes and used stratified LD score regression to quantify heritability enrichment across immune and epithelial annotations. To delineate shared versus disease-specific genetic signals, we performed three comparative analyses: local genetic correlation, CRSwNP-CRS colocalization, and genomic structural equation modeling. Finally, we performed proteome-wide Mendelian randomization to identify circulating proteins with putative causal effects on CRS and CRSwNP. Results This GWAS meta-analysis identified 96 genome-wide significant loci for CRSwNP and 41 for overall CRS, prioritizing 92 and 39 candidate genes, respectively. CRSwNP and overall CRS showed shared genetic susceptibility (rg = 0.59; P = 6.8e-16), while CRS exhibited broader genetic correlations across multiple respiratory disorders. Pathway analyses consistently implicated immune signaling (albeit with disease-specific emphases) and lipid-metabolism networks. Single-cell analyses demonstrated distinct expression of CRSwNP-prioritized genes across nasal epithelial and immune cell clusters, and immune annotations explained more CRSwNP heritability (enrichment score = 4.1; P = 0.010) than epithelial annotations (2.5; P = 0.072). Comparative genetic analyses highlighted multiple shared loci (including BACH2, CD247, FADS2, FOXP1, FUT2, GPX4, IL7R, NDFIP1, RAB5B, RORA, SMAD3, and TSLP) as well as 3 CRSwNP-specific and 6 CRS-specific loci. Proteome-wide MR identified 10 and 8 putatively causal circulating proteins for CRSwNP and overall CRS, respectively, with TNFSF11, IL2RB, and STX4 associated with both conditions. Conclusions This multi-population GWAS meta-analysis expanded genetic discovery for CRS and CRSwNP and showed substantial shared liability with distinct disease-specific components. Immune annotations explained a larger proportion of CRSwNP heritability than epithelial annotations, reinforcing the primacy of immune-driven mechanisms in polyp disease.
Khattab, A.; Wang, Z.; Srinivasasainagendra, V.; Tiwari, H. K.; Loos, R.; Limdi, N.; Irvin, M. R.
Background: Diabetic kidney disease (DKD) is a leading cause of kidney failure in individuals with type 2 diabetes (T2D), yet risk identification in routine clinical practice remains incomplete. A critical and often overlooked barrier is risk observability: how much of a patient's underlying risk is actually captured in their clinical record at the time of screening. Existing prediction models evaluate performance using model-specific thresholds, making it difficult to understand how additional data sources alter real-world screening behavior or which individuals benefit when models are expanded. Methods: We developed a series of five nested machine learning models evaluated at a one-year landmark following T2D diagnosis using data from the All of Us Research Program (N = 39,431; cases = 16,193). Each successive model added a distinct information layer -- intrinsic risk, laboratory snapshots, medication exposure, longitudinal care trajectories, and social determinants of health (SDOH) -- while retaining all prior features. All models were evaluated under a fixed screening policy targeting 90% specificity, so that the false positive rate remained constant as the information available to the model grew. External validation was conducted in the BioMe Biobank (N = 9,818) without retraining. Results: Discrimination improved consistently across layers, from AUROC 0.673 (M1) to 0.797 (M5). Under the fixed screening policy, sensitivity nearly doubled from 0.27 to 0.49, with a cumulative recovery of 30.4% of cases missed by the base model. Gains were driven by distinct subgroups at each transition: laboratory features identified biologically high-risk individuals; medication features captured those with high treatment intensity reflecting advanced cardiometabolic burden; longitudinal care trajectory features rescued cases with biological instability observable only through repeated measurements; and SDOH features recovered individuals with limited clinical observability, with rescue probability highest among those with the fewest recorded monitoring domains. Sparse data in the clinical record indicated low observability, not low risk. Social and genetic features each contributed most when downstream physiologic signal was limited, supporting a contextual rather than universal role for each. In BioMe, discrimination was attenuated (M4 AUROC 0.659), but the relative ordering of information layers was fully preserved, and a systematic upward shift in predicted probability distributions underscored the need for recalibration before deployment in a new setting. Conclusions: DKD risk detection in T2D is substantially improved by integrating complementary information layers under a fixed clinical screening policy, with gains arising from distinct domains that identify at-risk individuals in different clinical contexts. The layered landmark framework introduced here reveals how risk observability -- shaped by monitoring intensity, healthcare engagement, and access -- determines what a screening model can detect, and provides a foundation for context-aware EHR-based screening that accounts for data availability at the time of risk assessment.
Graphical abstract: Study design and layered DKD screening framework. The top row defines the cohort timeline, in which predictors are derived from clinical data collected between T2D diagnosis and the 1-year landmark, and incident DKD is ascertained after the landmark. The second row depicts the nested model architecture, in which five successive models sequentially incorporate intrinsic risk, laboratory snapshot features, medication exposure, longitudinal care trajectories, and social determinants of health, while retaining all features from prior layers. The third row summarizes model development in the All of Us Research Program (N = 39,431) and external validation in the BioMe Biobank (N = 9,818), where the same trained models and risk thresholds were applied without retraining. The bottom row highlights the three evaluation domains: predictive performance, fixed-policy screening, and missed-case recovery context. DKD, diabetic kidney disease; T2D, type 2 diabetes; PRS, polygenic risk scores; AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve; PPV, positive predictive value; SHAP, SHapley Additive exPlanations.
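An illustrative sketch of the fixed screening policy described in this abstract: pick the score threshold that yields 90% specificity among non-cases, then compare sensitivity across models at that fixed false-positive rate. The simulated scores and the M1/M5 labels below are hypothetical placeholders, not the study's models or data.

```python
# Sketch of a fixed-specificity screening policy (assumed, simplified implementation).
import numpy as np

rng = np.random.default_rng(0)

def threshold_at_specificity(scores_noncases, specificity=0.90):
    # Threshold at the score quantile among non-cases that leaves
    # (1 - specificity) of them flagged, i.e. a fixed false-positive rate.
    return np.quantile(scores_noncases, specificity)

def sensitivity_under_policy(scores, y, thr):
    return float((scores[y == 1] >= thr).mean())

# Simulated outcome and two model score sets (M1 weaker, M5 stronger).
y = rng.integers(0, 2, size=5000)
m1 = rng.normal(loc=0.3 * y, scale=1.0)   # base-model scores
m5 = rng.normal(loc=0.9 * y, scale=1.0)   # richer-model scores

for name, s in [("M1", m1), ("M5", m5)]:
    thr = threshold_at_specificity(s[y == 0])   # 90% specificity by construction
    print(name, "sensitivity at fixed 90% specificity:",
          round(sensitivity_under_policy(s, y, thr), 3))
```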
Lee, H.; Kim, H.
Background: CD276 has been proposed as a candidate gene associated with the biological characteristics of meningioma, but its predictive position and interpretive significance within a transcriptomic classifier have not yet been clearly established. Accordingly, this study aimed to evaluate CD276 stepwise across internal model development, external validation, calibration, decision-analytic assessment, feature stability, and robustness analyses using public transcriptomic cohorts. Methods: The analyses in this study were organized into two interconnected notebooks. In Notebook A, we reconstructed the internal training cohort (GSE183653), evaluated the CD276 single-gene signal, and then developed a transcriptome-wide multigene classifier. We also performed permutation importance, bootstrap confidence interval, label permutation test, repeated cross-validation, CD276 ablation, and internal calibration analyses. In Notebook B, we reproduced the external validation cohort (GSE136661) in a fixed common-gene space, applied train-only recalibration and train-only threshold transfer, and extended the interpretation through decision curve analysis, stability analysis, enrichment analysis, and one-factor-at-a-time robustness analysis. Results: The internal training cohort comprised 185 samples (25 WHO grade III cases) and 58,830 genes. CD276 expression showed a significant association with WHO grade, but the internal discrimination of the CD276-only baseline was limited (ROC-AUC 0.628, average precision 0.323, balanced accuracy 0.540). In contrast, the initial transcriptome-wide model showed ROC-AUC 0.834 and PR-AUC 0.509, and under 5-fold cross-validation, the canonical full-transcriptome model and the CD276-forced 5,001-feature branch showed mean ROC-AUC/PR-AUC of 0.854/0.564 and 0.855/0.606, respectively, outperforming the CD276-only baseline at 0.644/0.391. CD276 was not included in the initial 5,000-feature filtered set and ranked 900th among 5,001 features even in the forcibly included 5,001-feature branch. In paired ablation analysis, the performance difference attributable to inclusion of CD276 was effectively zero (delta ROC-AUC 0.000062, delta PR-AUC 0.000056). Internal calibration analysis showed an overconfident probability pattern (Brier score 0.10501, intercept -1.421392, slope 0.413241). In external validation, the fixed multigene pipeline achieved ROC-AUC 0.928 and PR-AUC 0.335. Train-only recalibration improved calibration metrics while preserving discrimination, and decision curve analysis showed threshold-dependent but limited external utility. Stability analysis showed overlap between core-stable genes and high-impact genes, but CD276 was not supported as a dominant stable core feature and remained in the target-of-interest tier. In robustness analysis, some perturbations preserved the primary interpretation, whereas others revealed transform sensitivity or an alternative high-performing feature-space solution. Conclusions: CD276 is a gene of interest associated with meningioma grade, but it is difficult to interpret as a strong standalone predictor or a dominant stable classifier feature. The main basis of predictive performance lay not in CD276 alone but in a broader multigene transcriptomic structure, and probability outputs need to be interpreted conservatively, with calibration taken into account. These findings position CD276 not as a direct single-gene classifier but as a biology-motivated target-of-interest that should be interpreted within a broader transcriptomic program.
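A minimal sketch, with simulated predictions, of the calibration checks this abstract reports: the Brier score plus a logistic-recalibration slope and intercept. One common convention regresses the observed outcome on the logit of the predicted probability; a fitted slope well below 1 (as reported for the internal model) indicates overconfident probabilities. The data and exact recalibration convention below are assumptions, not the authors' code.

```python
# Calibration slope/intercept and Brier score on deliberately miscalibrated predictions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(1)
p_true = rng.uniform(0.05, 0.6, size=500)           # hypothetical true risks
y = rng.binomial(1, p_true)
p_hat = np.clip(p_true ** 0.5, 1e-6, 1 - 1e-6)      # deliberately miscalibrated predictions

logit_p = np.log(p_hat / (1 - p_hat)).reshape(-1, 1)
recal = LogisticRegression(C=1e6).fit(logit_p, y)   # effectively unpenalised refit

print("Brier score:", round(brier_score_loss(y, p_hat), 4))
print("calibration slope:", round(recal.coef_[0, 0], 3))
print("calibration intercept:", round(recal.intercept_[0], 3))
```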
Pinero, S. L.; Li, X.; Lee, S. H.; Liu, L.; Li, J.; Le, T. D.
Long COVID affects millions of people worldwide, yet no disease-modifying treatment has been approved, and existing interventions have shown only modest and inconsistent benefits. A key reason for this limited progress is that current computational drug repurposing pipelines do not match well with the clinical reality of Long COVID. These patients often have persistent, multisystemic symptoms and may already be taking multiple medications, making treatment safety a primary concern. However, most repurposing workflows still treat safety as a downstream filter and rely on disease-associated targets rather than causal drivers. They also assume that the findings of one analysis would generalize across the diverse presentations of Long COVID. We introduce SPLIT, a safety-first repurposing framework that addresses these limitations. SPLIT prioritizes safety at the start of the candidate evaluation, integrates complementary causal inference strategies to identify likely driver genes, and uses a counterfactual substitution design to compare drugs within specific cohort contexts. When applied to cognitive and respiratory Long COVID cohorts, SPLIT revealed three main findings. First, drugs with similar predicted efficacy could have very different predicted safety profiles. Second, the drugs flagged as unfavorable were often different between the two cohorts, showing that drug prioritization is phenotype-specific. Third, SPLIT flagged 18 drugs currently under active investigation in Long COVID trials as having unfavorable predicted profiles. SPLIT provides a practical framework to identify safer, more context-appropriate candidates earlier in the process, supporting more targeted and better-tolerated treatment strategies for Long COVID.
Inoki, Y.; Horinouchi, T.; Sakakibara, N.; Ishiko, S.; Yamamoto, A.; Aoyama, S.; Kimura, Y.; Ichikawa, Y.; Tanaka, Y.; Kondo, A.; Yamamura, T.; Ishimori, S.; Araki, Y.; Asano, T.; Fujimura, J.; Fujinaga, S.; Hamada, R.; Inoue, N.; Kaito, H.; Kiyota, K.; Kobayashi, A.; Kobayashi, Y.; Kumagai, N.; Miyano, H.; Ohtomo, Y.; Sasaki, S.; Suzuki, R.; Washio, M.; Yamada, Y.; Yamasaki, Y.; Yokoyama, T.; Iijima, K.; Nagano, C.; Nozu, K.
Chronic benign proteinuria (PROCHOB), caused by biallelic pathogenic variants in CUBN, presents in childhood as isolated, asymptomatic tubular proteinuria with preserved long-term kidney function. Because its clinical presentation closely mimics early-stage glomerular diseases with moderate proteinuria and without increased urinary β2-microglobulin (uBMG) and α1-microglobulin, numerous patients undergo unnecessary kidney biopsies and receive angiotensin-converting enzyme inhibitors or angiotensin II receptor blockers before genetic testing is considered. Using high-throughput aptamer-based urinary proteomics (SomaScan®), we identified urinary myoglobin as a disease-specific biomarker for PROCHOB. We developed and confirmed a diagnostic approach in which the urinary myoglobin-to-creatinine (uMB/Cr) ratio robustly distinguishes PROCHOB from other kidney diseases with moderate glomerular proteinuria. Although certain cases of Dent disease causing megalin dysfunction exhibit increased urinary myoglobin levels, PROCHOB and Dent disease can be clearly distinguished based on the uBMG-to-creatinine ratio. This biomarker reflects impaired proximal tubular protein reabsorption because of cubilin dysfunction and remains normal in healthy individuals or those with typical glomerular diseases with moderate proteinuria. Our findings establish a noninvasive diagnostic tool for PROCHOB that prompts targeted genetic testing for CUBN variants using the uMB/Cr and uBMG-to-creatinine ratios. This strategy has the potential to transform the clinical diagnostic pathway for isolated proteinuria.
Krepel, J.; Binkyte, R.; Kerkouche, R.; Harries, M.; Klett-Tammen, C. J.; Fritz, M.; Kesselheim, S.; Kuehn, M.; Bazarova, A.; Lange, B.
During the COVID-19 pandemic, reported incidence data played a central role in public health surveillance and in tracking epidemic dynamics, although they provide limited insight into the behavioral, immunological, and socioeconomic drivers of transmission. Population-based seroprevalence studies with linked survey data offer a rich but untapped source of individual-level information that can complement routine surveillance. In this study, we investigate whether aggregated seroprevalence cohort data can be leveraged to predict local COVID-19 incidence and to identify interpretable predictors associated with transmission dynamics. Using data from the Multilocal SeroPrevalence (MuSPAD) study in Germany (2020-2022), we trained multiple machine learning models, including least absolute shrinkage and selection operator (LASSO), vector autoregressive models (VAR), multilayer perceptrons (MLPs), and long short-term memory neural networks (LSTMs), to predict location-specific seven-day incidence rates. Feature importance was assessed using regression coefficients where applicable and model-agnostic explainability methods, including Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). Across model classes, cohort-derived features enabled accurate prediction of local incidence, with time-aware models achieving the strongest performance. Consistent predictors included prior infection and testing history, employment-related changes, vaccination status, and mask-wearing behavior, highlighting the importance of behavioral and reporting-related signals. While differential privacy introduced modest degradation in predictive performance under strict privacy budgets, SHAP-based explanations remained stable, and LIME-based explanations were more sensitive to privacy-induced noise. These results demonstrate that aggregated cohort data encode meaningful and interpretable signals of population-level transmission dynamics. Population-based serosurveys therefore provide a complementary source of information for predicting local COVID-19 incidence and identifying key drivers of transmission beyond routine surveillance data. Our findings show that integrating interpretable machine learning with privacy-aware analysis enables actionable insights from sensitive cohort data, supporting their use in digital epidemiology and informing data-driven public health decision-making.
Schwoebel, J.; Frasch, M.; Spalding, A.; Sewell, E.; Englert, P.; Halpert, B.; Overbay, C.; Semenec, I.; Shor, J.
As health systems begin deploying autonomous AI agents that make independent clinical decisions and take direct actions within care workflows, ensuring patient safety and care quality requires governance standards that go beyond existing medical device frameworks designed for human-in-the-loop prediction tools. This paper introduces the Healthcare AI Agents Regulatory Framework (HAARF), a comprehensive verification standard for autonomous AI systems in clinical environments, developed collaboratively with 40+ international experts spanning regulatory authorities, clinical organizations, and AI security specialists. HAARF synthesizes requirements from nine major regulatory frameworks (FDA, EU AI Act, Health Canada, UK MHRA, NIST AI RMF, WHO GI-AI4H, ISO/IEC 42001, OWASP AISVS, IMDRF GMLP) into eight core verification categories comprising 279 specific requirements across three risk-based implementation levels. The framework addresses critical gaps in health system readiness for autonomous AI, including: (1) progressive autonomy governance with clinical accountability, (2) tool-use security for agents that independently access EHRs, medical devices, and clinical systems, (3) continuous equity monitoring and bias mitigation across diverse patient populations, and (4) clinical decision traceability preserving human oversight authority. We validate HAARF's enforcement capabilities through a scenario-based red-team evaluation comprising six adversarial scenarios executed under baseline (no middleware) and HAARF-guardrailed conditions (N = 50 trials each, Gemini 2.5 Flash primary with Claude Sonnet 4.6 cross-model validation). In baseline conditions, the agent model executes unauthorized tools in 56-60% of adversarial trials. Under the HAARF condition, deterministic middleware enforcement reduces the unauthorized-tool success rate to 0%, with 0% contraindication misses and 0% policy-injection success (95% Wilson CI [0.00, 0.07]). Cross-model validation confirms identical security metrics, supporting HAARF's model-agnostic design. Mapping analysis demonstrates 48-88% coverage of major regulatory frameworks, with per-category FDA alignment ranging from 73% (C5, Agent Registration) to 91% (C3, Cybersecurity; C7, Bias & Equity). Initial validation with healthcare organizations shows a 40-60% reduction in multi-jurisdictional compliance burden and improved clinical safety governance outcomes. HAARF provides health systems with a practical, risk-stratified pathway for safe AI agent deployment, shifting from reactive compliance to proactive quality governance while maintaining rigorous patient safety standards and human-centered care principles.
Omar, M.; Agbareia, R.; McGreevy, J.; Zebrowski, A.; Ramaswamy, A.; Gorin, M.; Anato, E. M.; Glicksberg, B. S.; Sakhuja, A.; Charney, A.; Klang, E.; Nadkarni, G.
Large language models are increasingly used for clinical guidance while their parent companies introduce advertising. We tested whether pharmaceutical ads embedded in the prompts of 12 models from OpenAI, Anthropic, and Google shift drug recommendations across 258,660 API calls and four experiments probing distinct epistemic conditions. When two drugs were both guideline appropriate, advertising shifted selection of the advertised drug by +12.7 percentage points (P < 0.001), with some model-scenario pairs shifting from 0% to 100%. Google models were the most susceptible (+29.8 pp), followed by OpenAI (+10.9 pp), while Anthropic models showed minimal change (+2.0 pp). When the advertised product lacked evidence or was clinically suboptimal, models resisted. This reveals a structured vulnerability: advertising does not override medical knowledge but fills the space where clinical evidence is underdetermined. An open-response sub-analysis (2,340 calls across three representative models) confirmed that advertising restructures free-text clinical reasoning: models echoed ad claims at 2.7 times the baseline rate while maintaining high stated confidence and rarely disclosing the ad. Susceptibility was provider-dependent (Google: +29.8 pp; OpenAI: +10.9 pp; Anthropic: +2.0 pp). Because this bias operates within clinically correct answers, it is invisible to accuracy-based evaluation, identifying a class of AI safety vulnerability that standard testing cannot detect.
Liang, S.; Kim, M. S.; Sui, Y.; Tan, Y.; Li, L.; Cho, S. M.; Koyama, S.; Liu, Y.; Paruchuri, K.; Chan, A.; Honigberg, M.; Natarajan, P.; Chatterjee, N.; Fahed, A. C.; Yu, Z.
Polygenic risk scores (PRSs) are typically validated using population-level metrics, masking variability in individual-level risk prediction and hindering clinical translation. To address this, we introduced a novel framework using a "benchmark" cohort (N=1184) of "unexpected coronary artery disease (CAD)": early-onset patients (<55 years) whose clinical profile (low 10-year risk, no diabetes, no severe hypercholesterolemia) excludes indications for preventive therapy. The occurrence of early CAD in these clinically low-risk individuals establishes a "ground truth" for high genetic risk. We evaluated 58 published CAD PRSs and demonstrated a disconnect between population-level performance and individual-level accuracy (the proportion of benchmark patients captured). The proportion captured by the 58 PRSs varied from 10.8% to 33.1%, and the top-performing score was 2-fold more effective at identifying the benchmark group than established non-genetic biomarkers, such as lipoprotein(a). Furthermore, benchmark patients never captured by any score exhibited significantly healthier lipid profiles. Our framework provides an essential method for validating clinical readiness of PRSs.
Kamau, A. F.; Merchant, G. R.; Nakajima, H. H.; Neely, S. T.
Conductive hearing loss (CHL) with a normal otoscopic exam can be difficult to diagnose because routine clinical measures such as audiometric air-bone gaps (ABGs) can identify a conductive component but often cannot distinguish among specific underlying mechanical pathologies (e.g., stapes fixation versus superior canal dehiscence, which may produce similar audiograms). Wideband tympanometry (WBT) is a fast, noninvasive test that can provide additional mechanical information across a broad range of frequencies (200 Hz to 8 kHz). However, WBT metrics are influenced by variations in ear canal geometry and probe placement and can be challenging to interpret clinically. In this study, we extend prior WBT absorbance-based classification work by estimating the middle ear input impedance at the tympanic membrane (ZME), a WBT-derived metric intended to reduce ear canal effects. To estimate ZME, we fit an analog circuit model of the ear canal, middle ear, and inner ear to raw WBT data collected at tympanometric peak pressure (TPP). Data from 27 normal ears, 32 ears with superior canal dehiscence, and 38 ears with stapes fixation were analyzed. A multinomial logistic regression classifier was trained using principal component analysis (retaining 90% variance) and stratified 5-fold cross-validation with regularization. We compared feature sets based on ABGs alone, ABGs combined with absorbance, and ABGs combined with the magnitude of ZME. The combination of ABGs and the magnitude of ZME produced the best performance, achieving an overall accuracy of 85.6% compared to 80.4% for ABGs alone and 78.4% for ABGs combined with absorbance. These results suggest that incorporating model-derived middle ear impedance features with standard audiometric measures (ABGs) can improve automated pathology classification for stapes fixation and superior canal dehiscence.
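A hedged sketch of the classifier this abstract describes: PCA retaining 90% of the variance feeding a regularised multinomial logistic regression, scored with stratified 5-fold cross-validation. The feature matrix (standing in for ABGs plus |ZME|) and the three-class labels below are simulated placeholders, not the study data.

```python
# Sketch of PCA (90% variance) + multinomial logistic regression with stratified 5-fold CV.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(97, 40))                  # 97 ears x 40 hypothetical features
y = rng.integers(0, 3, size=97)                # 0=normal, 1=SCD, 2=stapes fixation

clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.90),                    # keep components explaining 90% of variance
    LogisticRegression(C=1.0, max_iter=2000),  # L2-regularised; multinomial across 3 classes
)
acc = cross_val_score(clf, X, y, scoring="accuracy",
                      cv=StratifiedKFold(5, shuffle=True, random_state=0))
print("mean cross-validated accuracy:", round(acc.mean(), 3))
```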
Bahig, S.; Oughton, M.; Vandesompele, J.; Brukner, I.
In dense urban settings, delays between diagnostic sampling and effective isolation can sustain transmission during peak infectiousness. We define a waiting-window transmission externality arising when infectious individuals remain mobile while awaiting results, formalized as E = N·P·TR·D, where N is daily testing volume, P test positivity, TR transmission during the waiting period, and D turnaround time. Using Monte Carlo simulation and a susceptible-infectious-recovered (SIR) framework, we quantify excess infections per 1,000 tests/day under multiple diagnostic workflows. A surge scenario incorporates positive coupling between TR and D (ρ = 0.45), reflecting co-occurrence of laboratory saturation and elevated contacts during system stress. Under centralized 48-hour workflows, excess infections reach ~80 at P = 10% and ~401 at P = 50%, increasing to ~628 under surge conditions. In contrast, near-patient rapid testing and home sampling reduce this to ~5 and ~25-26, respectively. Workflows that eliminate the waiting window, either through immediate isolation at sampling or through home-based PCR that returns results at the point of collection, effectively collapse the transmission term. These findings identify diagnostic delay as a modifiable driver of epidemic dynamics. Operational redesign of testing workflows, including decentralized sampling and home-based molecular diagnostics, offers a scalable pathway to improve epidemic controllability and reduce inequities in dense urban environments.
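A worked illustration, not the paper's simulation, of the externality formula E = N·P·TR·D, reading TR as expected onward transmissions per test-positive person per day spent waiting. The TR value below is an assumption chosen so the 48-hour centralized scenario lands near the ~80 excess infections quoted above; the paper's own estimates come from Monte Carlo simulation within an SIR framework.

```python
# Deterministic evaluation of E = N * P * TR * D for three turnaround times.
def waiting_window_externality(N, P, TR, D_hours):
    return N * P * TR * (D_hours / 24.0)

N = 1000      # tests per day
P = 0.10      # test positivity
TR = 0.4      # assumed transmissions per infectious person per waiting day
for D in (48, 4, 0):   # centralized lab, near-patient rapid, immediate isolation at sampling
    E = waiting_window_externality(N, P, TR, D)
    print(f"turnaround {D:>2} h -> ~{E:.0f} excess infections per day")
```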
Dutta, A.; Guha, P.; Selvarajan, A. V.; Chowdhury, N.; Banerjee, P.; Sarkar Ghosh, S.; Shaw, A. K.; Ganguli, D.; Sunderam, U.; Roy, M. K.; Banerjee, S.; Srinivasan, R.; Roy, P.; Saha, V.; Dutta, A.; GuhaSarkar, D.
Gallbladder cancer (GBC) is a highly lethal malignancy with limited experimental models to study disease biology or evaluate therapeutic responses. Although canonical Wnt activation is commonly used for patient-derived organoid (PDO) development and expansion, gallbladder PDOs have also been generated under Wnt-inhibitory conditions. No comparative assessment has determined how Wnt pathway modulation influences gallbladder PDO development, phenotype, or drug response. This study systematically compared the impact of canonical Wnt activation (WNTAct medium containing CHIR99021) versus inhibition (WNTInh medium containing DKK1) on the establishment, propagation, molecular features, and therapeutic responses of PDOs generated from malignant or non-malignant gallbladder tissues derived from the same patient. Both media supported successful PDO generation with comparable efficiency, preserving biliary epithelial functions and marker expression. Transcriptomic profiling confirmed selective enrichment of canonical Wnt target genes in PDOs generated in WNTAct cultures. WNTAct conditions enabled markedly superior long-term propagation, whereas WNTInh cultures more consistently retained the dysplastic features of malignant samples. Gemcitabine response assays demonstrated significantly greater drug sensitivity in PDOs grown in WNTAct medium, a phenotype reversible upon media switching but requiring extended adaptation, indicating a dynamic and context-dependent influence of Wnt signaling on chemotherapeutic vulnerability. Collectively, the findings reveal a trade-off between long-term propagation and histological fidelity in gallbladder PDOs and show that Wnt signaling modulates gemcitabine sensitivity in a reversible manner. This comparative framework provides practical guidance for selecting culture conditions for gallbladder PDO-based disease modelling and precision oncology applications.
Qi, J.; Zeng, P.
Background: Renal impairment is associated with increased risk of Parkinson's disease (PD) in general populations; however, the renal-PD link within cardiovascular disease (CVD) patients remains unclear, despite the high comorbidity of renal dysfunction and elevated PD risk in this population. Objectives: To assess the association of renal function with PD, its longitudinal trajectories, and its predictive value specifically within a cardiovascular disease cohort. Methods: Among 29,266 UK Biobank CVD patients, we assessed baseline renal function via creatinine-based (eGFRcr) and cystatin C-based (eGFRcys) estimated glomerular filtration rates. Multivariable Cox regression analyzed associations with incident PD and all-cause mortality, with extensive sensitivity analyses addressing reverse causation and confounding. A nested case-control analysis characterized pre-PD eGFR trajectories over 14 years. Finally, we evaluated whether renal function improved the predictive ability of PREDICT-PD. Results: Over a median 13.1-year follow-up, 489 incident PD cases and 5,919 deaths occurred. Lower eGFR levels exhibited dose-dependent associations with increased PD risk (eGFRcr: HR=0.87 [0.80~0.95]; eGFRcys: HR=0.90 [0.82~0.99]) and all-cause mortality (eGFRcr: HR=0.77 [0.75~0.79]; eGFRcys: HR=0.64 [0.63~0.66]). Pre-PD eGFR trajectories diverged significantly from controls starting over 14 years before diagnosis. eGFR-defined chronic kidney disease (<60 ml/min/1.73m2) conferred 38~60% higher PD risk and 159~234% elevated mortality risk, and significantly enhanced PREDICT-PD's discrimination, with a 1.18~1.34% increase in prediction accuracy. Conclusions: Impaired renal function is an independent risk factor for PD and all-cause mortality in CVD patients, preceded by a slow, progressive eGFR decline starting >14 years before diagnosis. Incorporating renal function substantially improves PD risk prediction in this population.
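A sketch only, in the spirit of the analysis above: a multivariable Cox model of incident PD on baseline eGFR with a couple of adjustment covariates. The lifelines package and the simulated dataframe are my assumptions, not the authors' pipeline.

```python
# Cox proportional-hazards model of incident PD on baseline eGFR (simulated data).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(3)
n = 2000
df = pd.DataFrame({
    "egfr_cys": rng.normal(80, 15, n),        # baseline cystatin C-based eGFR
    "age": rng.normal(62, 7, n),
    "sex": rng.integers(0, 2, n),
    "follow_up_years": rng.uniform(1, 14, n),
    "incident_pd": rng.binomial(1, 0.02, n),
})

cph = CoxPHFitter()
cph.fit(df, duration_col="follow_up_years", event_col="incident_pd")
print(cph.summary[["coef", "exp(coef)", "p"]])   # exp(coef) is the hazard ratio per unit
```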
Hornak, G.; Heinolainen, A.; Solyomvari, K.; Silen, S.; Renkonen, R.; Koskinen, M.
Selecting an effective treatment relies on accurately anticipating a patient's response to alternative interventions. However, forecasting longitudinal clinical trajectories remains difficult because electronic health records contain heterogeneous, irregularly sampled data over extended time periods. These issues are especially relevant for laboratory measurements, which are central to diagnostics, assessment of therapeutic responses, and tracking disease progression in routine clinical practice. However, existing deep learning methods for counterfactual prediction usually assume regularly sampled data, an assumption incompatible with the irregular, heterogeneous data-generation processes of real-world clinical practice. Here we present the Time-Aware G-Transformer, which integrates causal G-computation with time-aware attention to predict counterfactual outcomes on irregular data. By explicitly conditioning on the timing of future observations and encoding measurement patterns, the model captures temporal dynamics that previous methods overlook. Evaluated on synthetic tumor growth data and on 90,753 cancer patient trajectories from an academic medical center, our approach demonstrates superior long-horizon (> 1 day) prediction accuracy and uncertainty calibration compared to state-of-the-art baselines. These results demonstrate that embedding temporal relations directly into the attention mechanism enables robust integration of patient history data for evaluating potential treatment strategies in personalized medicine.
Schmidt, C.; Samartsidis, P.; Seaman, S.; Emmanouil, B.; Foster, G.; Reid, L.; Smith, S.; De Angelis, D.
To minimise health disparities, equitable access to medical treatment is paramount. In a pioneering intervention, National Health Service England's Hepatitis C virus (HCV) programme has implemented country-wide peer support to boost treatment access. Peer support workers (peers) are individuals with relevant lived experience, who promote testing and treatment in marginalised populations underserved by traditional health services. We evaluated the English peers intervention, exploiting its staggered rollout and rich surveillance data between June 2016 and May 2021. Peers increased HCV cases identified by 13.9% (95% credible interval (95% CrI) [5.3, 21.7]), sustained viral responses by 8.0% (95% CrI [-4.4, 18.6]), and drug services referrals by 8.8% (95% CrI [-12.5, 22.6]). The intervention's effectiveness was magnified during the first COVID-19 lockdown, and individuals supported by peers typically belonged to populations with poor treatment access. Our findings indicate that peers can boost equity in treatment access on a national scale.
LAM, Q. T.; Fan, F.-Y.; Wang, Y.-L.; Wu, C.-Y.; Sun, Y.-S.; Vo, T. T. T.; Kuo, H.; Kha, Q. H.; Le, M. H. N.; Vu, G.; Le, N. Q. K.; Lee, I.-T.
Objectives: Machine learning can predict severe tooth loss (STL, 6 or more missing teeth), but opaque black-box models neglecting complex survey designs limit clinical adoption. This study developed and externally validated an intrinsically interpretable, survey-weighted framework for population-level STL prediction, capturing complex socio-behavioral and systemic health determinants. Methods: We analyzed nationally representative data from BRFSS 2022 (derivation, N=433,772), BRFSS 2024 (temporal validation, N=448,213), and the clinically examined NHANES 2015-2018 (cross-domain validation, N=10,775). Missing data were resolved using an anti-leakage HistGradientBoosting MICE pipeline, preserving multivariate epidemiological variance. An Explainable Boosting Machine (EBM, GA2M) was natively trained by integrating complex survey weights. For external clinical validation, structural domain shift was addressed through non-parametric Isotonic Regression recalibration. Results: The EBM achieved strong temporal stability on BRFSS 2024 (AUC: 0.8627; Brier Score: 0.0845). Upon cross-domain validation against NHANES 2015-2018, the calibrated model demonstrated robust transportability (AUC: 0.7504; Brier Score: 0.1358). Notably, the zero-shot EBM (AUC: 0.7591) closely matched the predictive ceiling of a black-box stacked meta-ensemble (AUC: 0.7706), eliminating the need for unstable post-hoc approximations. Fully auditable shape functions explicitly revealed non-linear risk thresholds and synergistic pairwise interactions for key predictors including age, smoking, income, and diabetes. Decision curve analysis confirmed substantial positive net clinical benefit across a 5%-50% risk threshold continuum. Conclusions: The MICE-EBM framework predicts STL with complete intrinsic transparency and robust probabilistic reliability. By successfully generalizing across unobserved temporal and clinical cohorts, this TRIPOD+AI compliant framework provides a clinically deployable tool to optimize targeted dental public health interventions.
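A minimal sketch, under stated assumptions, of the workflow this abstract describes: an Explainable Boosting Machine trained with survey weights, then recalibrated with isotonic regression for a domain-shifted external cohort. It requires the interpret package; the data, the weights, and passing weights via `sample_weight` reflect my reading of that API rather than the authors' code.

```python
# Survey-weighted EBM training followed by isotonic recalibration (illustrative only).
import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(5000, 12))               # placeholder survey features
y = rng.binomial(1, 0.2, size=5000)           # severe tooth loss indicator
w = rng.uniform(0.2, 5.0, size=5000)          # complex survey weights

ebm = ExplainableBoostingClassifier(interactions=10, random_state=0)
ebm.fit(X, y, sample_weight=w)                # assumed: weights enter the boosting natively

# Recalibrate probabilities on an external, shifted cohort (in practice a dedicated
# calibration split would be used rather than the evaluation data itself).
X_ext = rng.normal(loc=0.3, size=(1500, 12))
y_ext = rng.binomial(1, 0.25, size=1500)
p_raw = ebm.predict_proba(X_ext)[:, 1]
iso = IsotonicRegression(out_of_bounds="clip").fit(p_raw, y_ext)
p_cal = iso.predict(p_raw)                    # monotone, non-parametric recalibration
print("mean raw vs calibrated probability:", round(p_raw.mean(), 3), round(p_cal.mean(), 3))
```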
Wang, X.-Y.; Li, M.-M.; Zhao, S.-M.; Jia, X.-Y.; Yang, W.-S.; Chang, L.-L.; Wang, H.-M.; Zhao, J.-T.
Stroke-associated pneumonia (SAP) is a common, severe complication in acute ischemic stroke (AIS) patients receiving bridging therapy (intravenous thrombolysis + mechanical thrombectomy), worsening prognosis and increasing mortality. Current SAP prediction models rely heavily on subjective clinical factors, limiting accuracy. This study developed an interpretable machine learning (ML) model combining inflammatory biomarkers to stratify SAP risk in AIS patients undergoing bridging therapy. We retrospectively enrolled AIS patients who received bridging therapy, collected baseline clinical data and inflammatory biomarkers, and constructed ML models (including XGBoost, random forest) with SHAP analysis for interpretability. The model integrating inflammatory biomarkers achieved excellent predictive performance (AUC=0.XX, 95%CI: XX-XX), outperforming traditional clinical models. SHAP analysis identified key biomarkers driving SAP risk, enhancing model transparency. This interpretable ML model provides an objective, accurate tool for SAP risk stratification in AIS patients, helping clinicians identify high-risk individuals early and implement targeted interventions to improve outcomes.
Sun, S.; Cai, C. X.; Fan, R.; You, S.; Tran, D.; Rao, P. K.; Suchard, M. A.; Wang, Y.; Lee, C. S.; Lee, A. Y.; Zhang, L.
Multimodal learning has the potential to improve clinical prediction by integrating complementary data sources, but the incremental value of imaging beyond structured electronic health record (EHR) data remains unclear in real-world settings. We developed a multimodal survival modeling framework integrating optical coherence tomography (OCT) and EHR data to predict time to visual improvement in patients with diabetic macular edema (DME), and evaluated how different ophthalmic foundation model representations contribute to prognostic performance. In a retrospective cohort of 973 patients (1,450 eyes) receiving anti-vascular endothelial growth factor therapy, we compared multimodal models combining 22,227 EHR variables with 196,402 OCT images, with OCT embeddings derived from three ophthalmic foundation models (RETFound, EyeCLIP, and VisionFM). The EHR-only model showed minimal prognostic discrimination (C-index 0.50 [95% CI, 0.45-0.55]). Incorporating OCT improved performance, with the magnitude of improvement depending on the representation. EHR+RETFound achieved the strongest performance (C-index 0.59 [0.54-0.65]), followed by EHR+EyeCLIP (0.57 [0.52-0.62]) and EHR+VisionFM (0.56 [0.51-0.61]). Multimodal models, particularly EHR+RETFound, demonstrated improved risk stratification with clearer separation of Kaplan-Meier curves. Partial information decomposition revealed that prognostic information was dominated by modality-specific contributions, with OCT and EHR providing largely distinct signals and minimal shared information. The magnitude of OCT-specific contribution varied across foundation models and aligned with observed performance differences. These findings indicate that OCT provides complementary prognostic value beyond structured clinical data, but gains are modest and depend strongly on representation choice. Our results highlight both the promise of multimodal modeling for personalized prognosis and the need for rigorous, context-specific evaluation of foundation models in real-world clinical settings.
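An illustrative sketch (simulated data; the lifelines package is my choice) of late-fusion multimodal survival modelling as described above: concatenating a pooled OCT foundation-model embedding with structured EHR covariates, fitting a penalised Cox model of time to visual improvement, and reading off the concordance index.

```python
# Late fusion of an OCT embedding with EHR covariates in a penalised Cox model.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(6)
n = 800
oct_embed = rng.normal(size=(n, 16))          # stand-in for a pooled OCT embedding (dim 16 here)
ehr = rng.normal(size=(n, 4))                 # a few structured EHR covariates
df = pd.DataFrame(np.hstack([oct_embed, ehr]),
                  columns=[f"oct_{i}" for i in range(16)] + [f"ehr_{i}" for i in range(4)])
df["months_to_improvement"] = rng.exponential(12, n)
df["improved"] = rng.binomial(1, 0.6, n)

cph = CoxPHFitter(penalizer=0.1)              # ridge penalty to stabilise many embedding dims
cph.fit(df, duration_col="months_to_improvement", event_col="improved")
print("apparent C-index:", round(cph.concordance_index_, 3))
```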
Specht, B.; Tayeb, Z. Z.; Garbaya, S.; Khadraoui, D.; EL-Khozondar, M.; Schneider, R.
Accurate inference of physiological state across the menstrual cycle has important applications in reproductive health and in understanding symptom dynamics, yet most non-hormonal approaches rely on wearable sensors or calendar-based tracking. Whether self-reported symptoms alone can support prospective, cross-subject phase classification remains unresolved. Here, we introduce a hybrid modelling framework that combines a gradient-boosted classifier with a Hidden Semi-Markov Model to infer four menstrual cycle phases (menstrual, follicular, fertile, and luteal) from self-reported data. The classifier captures non-linear symptom patterns, while the temporal model imposes biologically grounded constraints, including cyclic ordering and realistic phase durations. In a leave-one-subject-out evaluation using hormonally annotated data from 41 participants, the model achieved 67.6% accuracy and a macro F1 score of 0.662. Features reflecting short-term symptom variability were more informative than absolute symptom levels, indicating that within-person fluctuation provides a more generalisable signal of cycle phase than symptom intensity alone. These findings demonstrate the feasibility of low-burden, device-free menstrual health monitoring, establish symptom dynamics as a basis for scalable digital biomarkers, and expand access to tracking in resource-constrained settings and populations underserved by wearable-based approaches.
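A hedged sketch of the leave-one-subject-out evaluation described above, covering only the classifier stage (the Hidden Semi-Markov smoothing is omitted). The simulated symptom features, phase labels, and the choice of HistGradientBoostingClassifier are placeholders, not the authors' setup.

```python
# Leave-one-subject-out cross-validation for a four-phase cycle classifier (simulated data).
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(5)
n_subjects, days = 41, 60
subject = np.repeat(np.arange(n_subjects), days)
X = rng.normal(size=(n_subjects * days, 8))    # daily self-reported symptom features
y = rng.integers(0, 4, size=n_subjects * days) # 0=menstrual, 1=follicular, 2=fertile, 3=luteal

clf = HistGradientBoostingClassifier(random_state=0)
pred = cross_val_predict(clf, X, y, groups=subject, cv=LeaveOneGroupOut())
print("accuracy:", round(accuracy_score(y, pred), 3),
      "| macro F1:", round(f1_score(y, pred, average="macro"), 3))
```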
Aiton, E.; Nazzari, V.; Cornish, R. P.; Faber, B. G.; Burden, C.; Birchenall, K.; Borges, M. C.; Lawlor, D. A.
Objective To describe trends in dispensing of monoclonal antibodies (mAbs) for autoimmune conditions during and around pregnancy. Design Descriptive study. Setting Lombardy, Italy between 2012 and 2024. Population All women of reproductive age (14-49 years) resident in Lombardy. Methods We described trends in mAb dispensations among women of reproductive age and the prevalence of mAb dispensing before, during and after pregnancy. We explored maternal factors associated with discontinuation. Main outcome measures Change in prescribing of mAbs over time in all women of reproductive age, and before, during and after pregnancy in those who became pregnant. Prevalence of discontinuation and switching mAbs around pregnancy. Results We included 3,049,175 women of reproductive age and 859,699 pregnancies. Prevalence of mAb dispensing during pregnancy increased over 60-fold over the study period, from 0.0041% (95%CI:0.00084, 0.012) in 2012 to 0.27% (95%CI:0.23, 0.32) in 2024. Pregnancy affected mAb dispensing, with mean prevalence decreasing from 0.080% (95%CI:0.074, 0.087) before pregnancy to 0.051% (95%CI:0.046, 0.057) by the third trimester. Over half (53.3%) of pre-existing users discontinued before or during pregnancy; discontinuation decreased over time, and varied substantially between mAbs. Switching mAbs during pregnancy was rare (3.3%). We found limited evidence that sociodemographic factors were associated with discontinuation, but that some health factors may be, such as use of assisted reproductive technology (OR=1.92, 95%CI:0.98-3.77). Conclusions Italian population-wide data from 2012-2024 show an increase in mAbs dispensed during pregnancy, and fewer instances of discontinuing these drugs over time. This may reflect recent changes in prescribing guidelines for mAbs in pregnancy.